For a frequentist, a confidence interval is an interval estimate that includes a given % of our estimated sampling distribution (e.g. 90%, 95%, 89%)
Point Estimate: A single value estimate of a parameter (e.g., sample mean \(\bar{x}\)).
Interval Estimate: A range of values within which the parameter is expected to lie with a certain level of confidence.
Frequentist Confidence Intervals
The general form of a confidence interval is:
\[
\text{Point Estimate} \pm \text{Margin of Error}
\]
Interpretation of Confidence Intervals
Remember: for Frequentists, the parameters \(\theta\) are fixed, and samples of data \(X\) are random.
Frequentists always imagine what would happen if we took infinitely many random samples of size \(n\).
If we used these theoretical samples to calculate a CI, we’d expect the long-run proportion of these intervals that contain the true population parameter \(\theta\) to be \(1-\alpha\) (the confidence level, e.g. 90%).
The confidence we have is in the procedure we use to generate the CIs: if the assumptions we’re making are true, \(1-\alpha\) of the CIs generated this way will contain \(\theta\).
We don’t know if our confidence interval is one of those \(1-\alpha\) CIs, but if we act as if it does contain \(\theta\), in the long-run we’ll be wrong only \(\alpha\) of the time.
\((lower, upper)\) are all guesses for \(\theta\) that are reasonable based on the data we’ve seen.
\((lower, upper)\) are values of \(\theta\) that are compatible with the data we’ve seen.
We are \((1-\alpha) \times 100\%\) confident that \(\theta\) is between \((lower, upper)\)
Note on Frequentism
You’ll see the phrase “act as if” a lot when I talk about Frequentist statistics. Frequentist methods allow you to define procedures that control for the long-run error rate.
If we act as if our CI contains \(\theta\), we only expect to be wrong \(\alpha\) of the time. We choose \(\alpha\).
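The component descriptions below refer to the known-variance interval for the mean, which (following the general form above) is:

\[
\bar{x} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)
\]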
\(\frac{\sigma}{\sqrt{n}}\) , the standard error, is the standard deviation of the sampling distribution.
\(z_{\alpha/2}\) is the z-score at the \(\frac{\alpha}{2}\) quantile (e.g. when \(\alpha = 0.05\) we look for the z-scores at the \(0.025\) and \(0.975\) quantiles)
\(z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)\) tells us how far away the upper and lower bounds are from the point estimate
Confidence Interval for the Mean (Known Variance)
\[
\alpha = 0.05; z_{\alpha/2} = 1.96
\]
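A quick sketch in R, using made-up weights and an assumed-known \(\sigma\) (both invented for illustration, not from the lecture):

```r
# 95% z-based CI for the mean, sigma assumed known
x <- c(4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.5)  # made-up data
sigma <- 0.3                      # pretend the population sd is known
n <- length(x)
alpha <- 0.05

z <- qnorm(1 - alpha / 2)         # ~1.96
moe <- z * sigma / sqrt(n)        # margin of error
c(lower = mean(x) - moe, upper = mean(x) + moe)
```

Note that `qnorm(0.975)` returns the same 1.96 critical value quoted above.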
Confidence Interval for the Mean (Unknown Variance)
When the population variance is unknown and the sample size is small, we use the t-distribution:
\(t_{\alpha/2, n-1}\): Critical value from the t-distribution with \(n-1\) degrees of freedom
Remember: \(t\) distributions have heavier tails. When we use an estimate for the standard deviation, we’re more uncertain about our estimates because there are two sources of uncertainty:
uncertainty about the sample mean \(\bar{x}\)
uncertainty about the sample standard deviation \(s\)
the \(t\) distribution better represents this added uncertainty
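You can see the heavier tails directly by comparing critical values: the \(t\) critical value is larger than the \(z\) one, and shrinks toward it as the degrees of freedom grow:

```r
alpha <- 0.05
qnorm(1 - alpha / 2)           # z critical value, ~1.96
qt(1 - alpha / 2, df = 4)      # small sample: much wider, ~2.78
qt(1 - alpha / 2, df = 19)     # n = 20: ~2.09
qt(1 - alpha / 2, df = 1000)   # large n: essentially the z value
```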
Example: Confidence Interval for a Mean
Suppose we have the following data representing the weights (in kg) of a sample of 20 gorillas that get an extra 🍌 each day.
Other gorillas in the zoo have a mean weight of 65 kg. Is there any evidence that gorillas that get an extra 🍌 have a higher mean weight? Discuss.
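One way to approach this in R is with `t.test()`, which computes the t-based CI for the mean. The weights below are simulated stand-ins (the real data are in the lecture materials):

```r
# hypothetical weights for the 20 banana gorillas -- invented for illustration
set.seed(1)
weights <- rnorm(20, mean = 68, sd = 6)

# 95% t-based CI for the mean weight
res <- t.test(weights, conf.level = 0.95)
res$conf.int

# compare the zoo-wide mean of 65 kg to the interval:
# if 65 falls below the lower bound, the data are hard to
# reconcile with a true mean of 65
```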
Evaluating Confidence Intervals
A good confidence interval should:
have good coverage
be precise
Coverage
Nominal Coverage: \(1-\alpha\)
Actual Coverage: \(P(lb \leq \theta \leq ub)\)
Coverage should be at least \(1-\alpha\). We might want to test coverage under a range of situations (e.g. small sample size, skewed distributions, etc.).
Many statistical properties are asymptotic, and we are never at the asymptote.
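Actual coverage can be checked by simulation: draw many samples from a population whose mean we know, build a CI from each, and count how often the interval contains the truth. A minimal sketch, assuming a normal population with known \(\sigma\):

```r
set.seed(42)
mu <- 10; sigma <- 3; n <- 30; alpha <- 0.05
z <- qnorm(1 - alpha / 2)

# for each simulated sample, does the 95% z-interval contain mu?
covered <- replicate(5000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  moe <- z * sigma / sqrt(n)
  (mean(x) - moe <= mu) && (mu <= mean(x) + moe)
})

mean(covered)  # actual coverage; should sit near the nominal 0.95
```

Re-running this with skewed populations or tiny \(n\) is exactly the kind of stress test described above.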
Precision
Many intervals where \(P(lb \leq \theta \leq ub) = 1-\alpha\)
Choose the narrowest one.
❓ Why would a narrower CI (assuming a constant \(1-\alpha\) confidence level) be useful?
Bootstrapped Confidence Intervals
Bootstrapping is a resampling technique that approximates a sampling distribution by sampling from a sample with replacement.
By sampling with replacement, we’re treating the sample as an approximate population, and sampling from it.
What is Bootstrapping?
Sample with replacement from the original data.
Generate “new” samples (bootstrap samples) of the same size as the original dataset.
Calculate the statistic (e.g., mean, median) on each bootstrap sample.
```r
ages <- c(18, 27, 27, 19, 25, 23, 26, 23, 20, 23, 19, 17, 21, 19, 18, 18,
          21, 18, 20, 25, 23, 17, 23, 18, 23, 25, 23, 25, 19, 27)

boot_sampling_dist <- replicate(
  n = 1000,                            # 1000 boot samples
  expr = mean(sample(ages,             # sample from ages
                     size = length(ages),  # same sample size
                     replace = TRUE)))     # w replacement
```
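Once the bootstrap sampling distribution is in hand, a simple percentile CI is just its quantiles. A self-contained sketch (repeating the setup above so it runs on its own):

```r
ages <- c(18, 27, 27, 19, 25, 23, 26, 23, 20, 23, 19, 17, 21, 19, 18, 18,
          21, 18, 20, 25, 23, 17, 23, 18, 23, 25, 23, 25, 19, 27)
set.seed(1)
boot_sampling_dist <- replicate(1000,
  mean(sample(ages, size = length(ages), replace = TRUE)))

# 95% percentile bootstrap CI: the 2.5% and 97.5% quantiles
# of the bootstrap distribution of the mean
quantile(boot_sampling_dist, probs = c(0.025, 0.975))
```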
Computationally Expensive: Requires a large number of resamples.
Sample Quality Dependent: Quality of results depends on the quality of original sample.
Edge Cases: May not perform well with very small sample sizes or extreme distributions.
Bootstrapping in R
```r
library(boot)

# fake data
set.seed(123)
data <- rnorm(50, mean = 5, sd = 2)

# statistic
mean_sq_function <- function(data, indices) {
  sample_data <- data[indices]
  return(mean(sample_data)**2)
}

# bootstrap
bootstrap_results <- boot(data, mean_sq_function, R = 1000)

# ci
ci <- boot.ci(bootstrap_results, type = "basic")
print(ci)
```
```
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL :
boot.ci(boot.out = bootstrap_results, type = "basic")

Intervals :
Level      Basic
95%   (20.39, 30.47 )
Calculations and Intervals on Original Scale
```
Using Confidence Intervals
❓ When choosing a confidence level, how would you decide? Are there tradeoffs to choosing a high vs. low confidence level?
Confidence Interval Overlap
We have a depression score that ranges from \(-3 \to 3\). Based on clinical research, people’s QOL is noticeably changed if we can change their depression score by \(0.25\) in either direction.
After doing corgi therapy, the 95% CI for the change in people’s depression scores is:
\[
(-0.05, 0.35)
\]
❓ If any change smaller than \(\pm 0.25\) is clinically irrelevant, what does this CI tell us about corgi therapy?
Read more here, but be warned: 2GPV is not a widely accepted practice at the moment; however, it shares a lot of ideas and properties with Equivalence Testing, which we’ll discuss later.